Combining Entropy Based Heuristics with Minimax Search and Temporal Differences to Play Hidden State Games

Authors

  • Gregory Calbert
  • Hing-Wah Kwok
Abstract

In this paper, we develop a method for playing variants of spatial games such as chess or checkers in which the state of the opponent is only partially observable. Each side has a number of hidden pieces that are invisible to the opposition. An estimate of the opponent’s state probability distribution is made by assuming that moves are chosen to maximize the entropy of the subsequent state distribution, or belief. The belief state of the game at any time is specified by a probability distribution over the opponent’s states and, conditional on each of these states, a distribution over our own states, which is our estimate of the opponent’s belief about our state. With this, we can calculate the relative uncertainty, or entropy balance. We use this information balance, along with other observable features and belief-based minimax search, to approximate the partially observable Q-function. Gradient descent is used to learn the advisor weights.
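The abstract does not give formulas, so the following is only a minimal sketch of how the quantities it names might be computed. It assumes Shannon entropy in bits, a sign convention for the entropy balance (opponent's expected uncertainty about us minus our uncertainty about the opponent), and a linear combination of advisor features for the Q-function approximation. The function names `entropy`, `entropy_balance`, `q_value`, and `td_update` are hypothetical and do not come from the paper.

```python
import math


def entropy(dist):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in dist if p > 0.0)


def entropy_balance(opp_belief, our_belief_given_opp):
    """Relative uncertainty between the two sides (sign convention assumed).

    opp_belief[i] is our probability that the opponent is in state i.
    our_belief_given_opp[i] is our estimate of the opponent's distribution
    over our states, conditional on the opponent being in state i.
    """
    our_uncertainty = entropy(opp_belief)
    # Expected entropy of the opponent's belief about us, averaged over
    # our own belief about which state the opponent is in.
    their_uncertainty = sum(
        p * entropy(d) for p, d in zip(opp_belief, our_belief_given_opp)
    )
    return their_uncertainty - our_uncertainty


def q_value(weights, features):
    """Assumed linear approximation: weighted sum of advisor features."""
    return sum(w * f for w, f in zip(weights, features))


def td_update(weights, features, target, lr=0.1):
    """One gradient-descent step on the squared error to a target value."""
    error = target - q_value(weights, features)
    return [w + lr * error * f for w, f in zip(weights, features)]


# Two equally likely opponent states; in each, the opponent is assumed to
# be uniformly uncertain over four of our states.
opp_belief = [0.5, 0.5]
our_belief_given_opp = [[0.25] * 4, [0.25] * 4]
balance = entropy_balance(opp_belief, our_belief_given_opp)
```

In this toy case our uncertainty about the opponent is 1 bit and the opponent's expected uncertainty about us is 2 bits, so the balance is positive under the assumed convention. The balance could then be passed as one entry of `features` alongside other observable features.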

Similar resources

TDLeaf(λ): Combining Temporal Difference Learning with Game-Tree Search

In this paper we present TDLeaf(λ), a variation on the TD(λ) algorithm that enables it to be used in conjunction with minimax search. We present some experiments in both chess and backgammon which demonstrate its utility and provide comparisons with TD(λ) and another less radical variant, TDdirected(λ). In particular, our chess program, “KnightCap,” used TDLeaf(λ) to learn its evaluati...

Comparing Minimax and Product in a Variety

This paper describes comparisons of the minimax backup rule and the product backup rule on a wide variety of games, including P-games, G-games, three-hole kalah, Othello, and Ballard’s incremental game. In three-hole kalah, the product rule plays better than a minimax search to the same depth. This is a remarkable result, since it is the first widely known game in which product has been found ...

Simulation Control in General Game Playing Agents

The aim of General Game Playing (GGP) is to create intelligent agents that can automatically learn how to play many different games at an expert level without any human intervention. One of the main challenges such agents face is to automatically learn knowledge-based heuristics in realtime, whether for evaluating game positions or for search guidance. In recent years, GGP agents that use Monte...

A minimax search algorithm for CDHMM based robust continuous speech recognition

In this paper, we propose a novel implementation of a minimax decision rule for continuous density hidden Markov model based robust speech recognition. By combining the idea of the minimax decision rule with a normal Viterbi search, we derive a recursive minimax search algorithm, where the minimax decision rule is repetitively applied to determine the partial paths during the search procedure. ...

A Simulation-Based General Game Player

The aim of General Game Playing (GGP) is to create intelligent agents that can automatically learn how to play many different games at an expert level without any human intervention. The traditional design model for GGP agents has been to use a minimax-based game-tree search augmented with an automatically learned heuristic evaluation function. The first successful GGP agents all followed that ...

Publication date: 2004